Scientists at ShanghaiTech University have developed CLAY, an AI model that generates detailed 3D objects from text or 2D images and, according to the researchers, outperforms earlier methods in quality, diversity, and generation speed. At its core, CLAY combines a multi-resolution variational autoencoder with a diffusion transformer, which allows it to process 3D content directly, without first converting it into another format. Trained on more than 500,000 3D models, CLAY can produce objects ranging from everyday items to complex creatures. Additional inputs give users precise control over the output, enabling CLAY to generate urban scenes and to reconstruct 3D models from sketches.